Mark typed buffer APIs safe (#996) (#1027) #1866
@@ -299,7 +296,7 @@ impl MutableBuffer {
     /// assert_eq!(buffer.len(), 8) // u32 has 4 bytes
     /// ```
     #[inline]
-    pub fn extend_from_slice<T: ToByteSlice>(&mut self, items: &[T]) {
+    pub fn extend_from_slice<T: ArrowNativeType>(&mut self, items: &[T]) {
This method was potentially unsound, as ToByteSlice is not sealed and so could theoretically be implemented for a type that is not trivially transmutable (which the implementation of this method implicitly assumes).
Edit: this is an API change
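For readers unfamiliar with the sealing argument, here is a minimal sketch of the pattern. All names in it (`sealed`, `NativeType`, `to_bytes`) are hypothetical and chosen for illustration; this is not the arrow crate's actual code. It shows why bounding a byte-reinterpreting function on a sealed trait lets the function itself be safe: no downstream type can satisfy the bound.

```rust
// Hypothetical names throughout; a sketch, not the arrow crate's implementation.
mod sealed {
    // `Sealed` lives in a private module, so code outside this crate cannot
    // name it and therefore cannot implement any trait that requires it.
    pub trait Sealed {}
    impl Sealed for u8 {}
    impl Sealed for u32 {}
    impl Sealed for f64 {}
}

/// Marker for primitive types in which every bit pattern is a valid value.
pub trait NativeType: sealed::Sealed + Copy {}
impl NativeType for u8 {}
impl NativeType for u32 {}
impl NativeType for f64 {}

/// Views a slice of native values as raw bytes.
/// This can be a safe function because `T` is limited to the types above.
pub fn to_bytes<T: NativeType>(items: &[T]) -> &[u8] {
    let len = std::mem::size_of_val(items);
    // SAFETY: the implementing types are primitives without padding, `u8`
    // has alignment 1, and the result borrows `items` for the same lifetime.
    unsafe { std::slice::from_raw_parts(items.as_ptr() as *const u8, len) }
}
```

With an unsealed bound, a downstream crate could implement the trait for a struct with padding or a type such as `bool`, and the same `unsafe` block would no longer be justified.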
@@ -576,7 +576,7 @@ macro_rules! def_get_binary_array_fn {
     fn $name(array: &$ty) -> Vec<ByteArray> {
         let mut byte_array = ByteArray::new();
         let ptr = crate::util::memory::ByteBufferPtr::new(
-            unsafe { array.value_data().typed_data::<u8>() }.to_vec(),
+            array.value_data().as_slice().to_vec(),
I don't know why this was ever using typed_data...
Clippy fix in #1868
Nice! Agree that this should be safe.
arrow/src/buffer/immutable.rs
Outdated
/// correctly for type `T`.
pub fn typed_data<T: ArrowNativeType>(&self) -> &[T] {
// SAFETY
// ArrowNativeType are trivially transmutable, and this method checks alignment
- // ArrowNativeType are trivially transmutable, and this method checks alignment
+ // ArrowNativeType is sealed (can't be implemented outside the arrow crate),
+ // trivially transmutable, and this method checks alignment
let (prefix, offsets, suffix) = self.as_slice().align_to::<T>();
/// This function panics if the underlying buffer is not aligned
/// correctly for type `T`.
pub fn typed_data<T: ArrowNativeType>(&self) -> &[T] {
I wonder if this is truly "safe" -- is it really true that any bit pattern is a valid ArrowNativeType? I am thinking about floating point representations in particular -- I wonder if this API could potentially create an invalid f32 / f64, which seems like it would thus still be unsafe 🤔
I think https://doc.rust-lang.org/std/primitive.f32.html#method.from_bits is relevant here. The short answer is that it is perfectly safe to transmute arbitrary bytes to floats: it may not be wise, but it is not UB.
In particular the standard library provides safe functions that transmute u32 -> f32, u64 -> f64, and so I think it is fair to say all bit sequences are valid.
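A small sketch of that point, using only the safe standard-library `f32::from_bits`. The helper name `floats_from_bits` is hypothetical and exists only for illustration:

```rust
/// Reinterprets each `u32` as an `f32` using only safe standard-library calls.
/// (`floats_from_bits` is a made-up helper, not part of arrow.)
pub fn floats_from_bits(bits: &[u32]) -> Vec<f32> {
    // `f32::from_bits` is a safe function: every 32-bit pattern maps to some
    // `f32`, whether finite, infinite, or NaN. This differs from `bool`,
    // where only the bit patterns 0 and 1 are valid values.
    bits.iter().map(|&b| f32::from_bits(b)).collect()
}
```

For instance, `0x3f80_0000` yields `1.0`, `0x7f80_0000` yields positive infinity, and patterns such as `0x7fc0_0000` or `0xffff_ffff` yield NaN values rather than undefined behavior.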
My understanding is that it is safe because there are no actual undefined bit patterns in any of the native types (as opposed to bool or Option<...> for example). Certain bit patterns might get canonicalized when interpreted as floating point values, but I don't think that would be considered undefined behavior. There are more details about specific behavior in the docs for f64::from_bits (which is considered safe).
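Putting the two conditions from this thread together (all bit patterns valid, alignment checked at runtime), a safe typed view over bytes might look roughly like the sketch below. The `Native` trait and `typed_view` function are hypothetical stand-ins, and `Native` is assumed to be sealed in the way `ArrowNativeType` is described above; this is not the arrow implementation.

```rust
/// Hypothetical stand-in for a sealed native-type trait. Assume it cannot be
/// implemented outside this crate and that every bit pattern is valid for
/// each implementing type.
pub trait Native: Copy {}
impl Native for u8 {}
impl Native for u32 {}
impl Native for f32 {}

/// Views `bytes` as a slice of `T`.
///
/// Panics if `bytes` is not aligned for `T` or its length is not a multiple
/// of `size_of::<T>()`.
pub fn typed_view<T: Native>(bytes: &[u8]) -> &[T] {
    // SAFETY: every bit pattern is valid for the `Native` types above, so
    // reinterpreting the aligned middle part cannot produce an invalid value.
    let (prefix, values, suffix) = unsafe { bytes.align_to::<T>() };
    // Reject buffers that leave unaligned leading bytes or leftover trailing bytes.
    assert!(
        prefix.is_empty() && suffix.is_empty(),
        "buffer is not aligned for the requested type"
    );
    values
}
```

The only failure mode left for callers is a panic on a misaligned or oddly sized buffer, which is why the function signature does not need `unsafe`.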
Codecov Report
@@ Coverage Diff @@
## master #1866 +/- ##
==========================================
- Coverage 83.48% 83.47% -0.01%
==========================================
Files 201 201
Lines 57000 57001 +1
==========================================
- Hits 47584 47583 -1
- Misses 9416 9418 +2
Which issue does this PR close?
Closes #996
Closes #1027
Rationale for this change
These APIs were originally marked unsafe because ArrowNativeType could be implemented outside the crate. Since #1819 this is no longer the case, so these methods no longer need to be unsafe, as they can't lead to UB.
What changes are included in this PR?
Removes unsafe from MutableBuffer::typed_data_mut and Buffer::typed_data.
Are there any user-facing changes?
No